detection model
Appendix 545 A Details of datasets and architectures 546 A.1 Object Detection Image Dataset
We evaluate our method on three well-known model architectures:, i.e., SSD [ Named Entity Recognition, and Question Answering. Find more details in Table 5. Recall, ROC-AUC, and Average Scanning Overheads for each model. A value of 1 indicates perfect classification, while a value of 0.5 indicates To the best of our knowledge, there is no existing detection methods for object detection models. We evaluate the IoU threshold used to calculate the ASR of inverted triggers. However, a threshold of 0.7 tends to degrade the Different score thresholds are tested when computing the ASR of inverted triggers.
- North America > United States > Indiana > Tippecanoe County > West Lafayette (0.05)
- North America > United States > Indiana > Tippecanoe County > Lafayette (0.05)
- Asia > Nepal (0.04)
- (5 more...)
DiffusionFake: Enhancing Generalization in Deepfake Detection via Guided Stable Diffusion
The rapid progress of Deepfake technology has made face swapping highly realistic, raising concerns about the malicious use of fabricated facial content. Existing methods often struggle to generalize to unseen domains due to the diverse nature of facial manipulations. In this paper, we revisit the generation process and identify a universal principle: Deepfake images inherently contain information from both source and target identities, while genuine faces maintain a consistent identity. Building upon this insight, we introduce DiffusionFake, a novel plug-and-play framework that reverses the generative process of face forgeries to enhance the generalization of detection models. DiffusionFake achieves this by injecting the features extracted by the detection model into a frozen pre-trained Stable Diffusion model, compelling it to reconstruct the corresponding target and source images. This guided reconstruction process constrains the detection network to capture the source and target related features to facilitate the reconstruction, thereby learning rich and disentangled representations that are more resilient to unseen forgeries. Extensive experiments demonstrate that DiffusionFake significantly improves cross-domain generalization of various detector architectures without introducing additional parameters during inference.
Object DGCNN: 3D Object Detection using Dynamic Graphs
Inspired by recent non-maximum suppression-free 2D object detection models, we propose a 3D object detection architecture on point clouds. Our method models 3D object detection as message passing on a dynamic graph, generalizing the DGCNN framework to predict a set of objects. In our construction, we remove the necessity of post-processing via object confidence aggregation or non-maximum suppression. To facilitate object detection from sparse point clouds, we also propose a set-to-set distillation approach customized to 3D detection.
Few-Cost Salient Object Detection with Adversarial-Paced Learning
Detecting and segmenting salient objects from given image scenes has received great attention in recent years. A fundamental challenge in training the existing deep saliency detection models is the requirement of large amounts of annotated data. While gathering large quantities of training data becomes cheap and easy, annotating the data is an expensive process in terms of time, labor and human expertise. To address this problem, this paper proposes to learn the effective salient object detection model based on the manual annotation on a few training images only, thus dramatically alleviating human labor in training models. To this end, we name this new task as the few-cost salient object detection and propose an adversarial-paced learning (APL)-based framework to facilitate the few-cost learning scenario. Essentially, APL is derived from the self-paced learning (SPL) regime but it infers the robust learning pace through the data-driven adversarial learning mechanism rather than the heuristic design of the learning regularizer. Comprehensive experiments on four widely-used benchmark datasets have demonstrated that the proposed approach can effectively approach to the existing supervised deep salient object detection models with only 1k human-annotated training images.
SCAN: Semantic Document Layout Analysis for Textual and Visual Retrieval-Augmented Generation
Dong, Yuyang, Ueda, Nobuhiro, Boros, Krisztián, Ito, Daiki, Sera, Takuya, Oyamada, Masafumi
With the increasing adoption of Large Language Models (LLMs) and Vision-Language Models (VLMs), rich document analysis technologies for applications like Retrieval-Augmented Generation (RAG) and visual RAG are gaining significant attention. Recent research indicates that using VLMs yields better RAG performance, but processing rich documents remains a challenge since a single page contains large amounts of information. In this paper, we present SCAN (SemantiC Document Layout ANalysis), a novel approach that enhances both textual and visual Retrieval-Augmented Generation (RAG) systems that work with visually rich documents. It is a VLM-friendly approach that identifies document components with appropriate semantic granularity, balancing context preservation with processing efficiency. SCAN uses a coarse-grained semantic approach that divides documents into coherent regions covering contiguous components. We trained the SCAN model by fine-tuning object detection models on an annotated dataset. Our experimental results across English and Japanese datasets demonstrate that applying SCAN improves end-to-end textual RAG performance by up to 9.4 points and visual RAG performance by up to 10.4 points, outperforming conventional approaches and even commercial document processing solutions.
- Asia > Myanmar > Tanintharyi Region > Dawei (0.04)
- Asia > Japan > Honshū > Kantō > Tokyo Metropolis Prefecture > Tokyo (0.04)
ShadowWolf -- Automatic Labelling, Evaluation and Model Training Optimised for Camera Trap Wildlife Images
The continuous growth of the global human population is leading to the expansion of human habitats, resulting in decreasing wildlife spaces and increasing human-wildlife interactions. These interactions can range from minor disturbances, such as raccoons in urban waste bins, to more severe consequences, including species extinction. As a result, the monitoring of wildlife is gaining significance in various contexts. Artificial intelligence (AI) offers a solution by automating the recognition of animals in images and videos, thereby reducing the manual effort required for wildlife monitoring. Traditional AI training involves three main stages: image collection, labelling, and model training. However, the variability, for example, in the landscape (e.g., mountains, open fields, forests), weather (e.g., rain, fog, sunshine), lighting (e.g., day, night), and camera-animal distances presents significant challenges to model robustness and adaptability in real-world scenarios. In this work, we propose a unified framework, called ShadowWolf, designed to address these challenges by integrating and optimizing the stages of AI model training and evaluation. The proposed framework enables dynamic model retraining to adjust to changes in environmental conditions and application requirements, thereby reducing labelling efforts and allowing for on-site model adaptation. This adaptive and unified approach enhances the accuracy and efficiency of wildlife monitoring systems, promoting more effective and scalable conservation efforts.
- Europe > Germany > Bremen > Bremen (0.28)
- North America (0.14)
- Europe > Spain (0.04)
- (2 more...)
- Workflow (1.00)
- Research Report (0.82)
- Overview (0.67)
- Information Technology (0.67)
- Energy (0.46)
- Information Technology > Sensing and Signal Processing > Image Processing (1.00)
- Information Technology > Communications > Social Media (1.00)
- Information Technology > Artificial Intelligence > Vision (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.47)
Exposing Pink Slime Journalism: Linguistic Signatures and Robust Detection Against LLM-Generated Threats
Shahriar, Sadat, Ayoobi, Navid, Mukherjee, Arjun, Musharrat, Mostafa, Vamsi, Sai Vishnu
The local news landscape, a vital source of reliable information for 28 million Americans, faces a growing threat from Pink Slime Journalism, a low-quality, auto-generated articles that mimic legitimate local reporting. Detecting these deceptive articles requires a fine-grained analysis of their linguistic, stylistic, and lexical characteristics. In this work, we conduct a comprehensive study to uncover the distinguishing patterns of Pink Slime content and propose detection strategies based on these insights. Beyond traditional generation methods, we highlight a new adversarial vector: modifications through large language models (LLMs). Our findings reveal that even consumer-accessible LLMs can significantly undermine existing detection systems, reducing their performance by up to 40% in F1-score. To counter this threat, we introduce a robust learning framework specifically designed to resist LLM-based adversarial attacks and adapt to the evolving landscape of automated pink slime journalism, and showed and improvement by up to 27%.
- North America > United States > Texas > Harris County > Houston (0.14)
- North America > Canada (0.04)
- Europe > France (0.04)
FOD-S2R: A FOD Dataset for Sim2Real Transfer Learning based Object Detection
Vashist, Ashish, Saadiyean, Qiranul, Sundaram, Suresh, Seelamantula, Chandra Sekhar
Foreign Object Debris (FOD) within aircraft fuel tanks presents critical safety hazards including fuel contamination, system malfunctions, and increased maintenance costs. Despite the severity of these risks, there is a notable lack of dedicated datasets for the complex, enclosed environments found inside fuel tanks. To bridge this gap, we present a novel dataset, FOD-S2R, composed of real and synthetic images of the FOD within a simulated aircraft fuel tank. Unlike existing datasets that focus on external or open-air environments, our dataset is the first to systematically evaluate the effectiveness of synthetic data in enhancing the real-world FOD detection performance in confined, closed structures. The real-world subset consists of 3,114 high-resolution HD images captured in a controlled fuel tank replica, while the synthetic subset includes 3,137 images generated using Unreal Engine. The dataset is composed of various Field of views (FOV), object distances, lighting conditions, color, and object size. Prior research has demonstrated that synthetic data can reduce reliance on extensive real-world annotations and improve the generalizability of vision models. Thus, we benchmark several state-of-the-art object detection models and demonstrate that introducing synthetic data improves the detection accuracy and generalization to real-world conditions. These experiments demonstrate the effectiveness of synthetic data in enhancing the model performance and narrowing the Sim2Real gap, providing a valuable foundation for developing automated FOD detection systems for aviation maintenance.
- Asia > India > Karnataka > Bengaluru (0.04)
- Europe > Netherlands > North Holland > Amsterdam (0.04)
Analysis of Incursive Breast Cancer in Mammograms Using YOLO, Explainability, and Domain Adaptation
Adhikari, Jayan, Joshi, Prativa, Baral, Susish
Abstract--Deep learning models for breast cancer detection from mammographic images have significant reliability problems when presented with Out-of-Distribution (OOD) inputs such as other imaging modalities (CT, MRI, X-ray) or equipment variations, leading to unreliable detection and misdiagnosis. Our strategy establishes an in-domain gallery via cosine similarity to rigidly reject non-mammographic inputs prior to processing, ensuring that only domain-associated images supply the detection pipeline. The OOD detection component achieves 99.77% general accuracy with immaculate 100% accuracy on OOD test sets, effectively eliminating irrelevant imaging modalities. ResNet50 was selected as the optimum backbone after 12 CNN architecture searches. The joint framework unites OOD robustness with high detection performance (mAP@0.5: Experimental validation establishes that OOD filtering significantly improves system reliability by preventing false alarms on out-of-distribution inputs while maintaining higher detection accuracy on mammographic data. The present study offers a fundamental foundation for the deployment of reliable AI-based breast cancer detection systems in diverse clinical environments with inherent data heterogeneity. A global health concern, breast cancer is the second-highest cause of cancer related to mortality in women. It has been recorded as the most diagnosed disease in the world in 2020 [1]. According to the World Health Organization, all types of cancer account for 626700 global deaths of women, out of which the breast is the predominant and second leading cause [2]. If diagnosed in its early development stage, the survival rate are likely to be high and the treatment cost will get reduced [3]. Studies has found that 30% breast cancer are diagnosed when the size of the mass is 30mm.
- North America > United States (0.04)
- Oceania > New Zealand (0.04)
- Europe > Portugal (0.04)
- (2 more...)